More uses of exchangeability: representations 
of complex random structures 



David J. Aldous



Abstract 

We review old and new uses of exchangeability, emphasizing the general 
theme of exchangeable representations of complex random structures. 
Illustrations of this theme include processes of stochastic coalescence and 
fragmentation; continuum random trees; second-order limits of distances 
in random graphs; isometry classes of metric spaces with probability 
measures; limits of dense random graphs; and more sophisticated uses 
in finitary combinatorics.

1 Introduction



Kingman's write-up [44] of his 1977 Wald Lectures drew attention to the
subject of exchangeability, and further indication of the topics of interest
around that time can be seen in the write-up [3] of my 1983 Saint-Flour
lectures. As with any mathematical subject, one might expect
some topics subsequently to wither, some to blossom and new topics to 
emerge unanticipated. This Festschrift paper aims, in informal lecture 
style, 

(a) to recall the state of affairs 25 years ago (sections 2.1-2.3, 3.1);

(b) to briefly describe three directions of subsequent development that
have recently featured prominently in monographs [16], [43], [52]
(sections 2.4, 3.1-3.2);

(c) to describe three recent rediscoveries, motivated by new theoretical
topics outside mainstream mathematical probability, of the theory of
representations of partially exchangeable arrays (sections 2.5, 5.1-5.2);

(d) to emphasize a general program that has interested me for 20 years.
It doesn't have a standard name, but let me here call it exchangeable
representations of complex random structures (section 4).

The survey focusses on mathematical probability; although the word
Bayesian appears several times, I do not attempt to cover the vast
territory of explicit or implicit uses of exchangeability in Bayesian
statistics, except to mention here its use in hierarchical models [20].

This article is very much a bird's-eye view. Of the monographs
mentioned above, let me point out Pitman's Combinatorial Stochastic
Processes [52], which packs an extraordinary number of detailed results into
200 pages of text and exercises. Exchangeability is a recurring theme in
[52], which covers about half of the topics we shall mention (and much
more, not related to exchangeability), and so [52] is a natural starting
place for the reader wishing to get to grips with details.



2 Exchangeability 

2.1 de Finetti's theorem 

I use exchangeability to mean, roughly, 'applications of extensions of de 
Finetti's theorem'. Let me assume only that the reader is familiar with 
the definition of an exchangeable sequence of random variables 

$(Z_i, i \ge 1) \stackrel{d}{=} (Z_{\pi(i)}, i \ge 1)$ for each finite permutation $\pi$

and with the common verbal statement of de Finetti's theorem: 

An infinite exchangeable sequence is distributed as a mixture of 
i.i.d. sequences. 

Scanning the graduate-level textbooks on measure-theoretic probability
on my bookshelves, the theorem makes no appearance in about half, a
brief appearance in others [18], [31], [46] and only three [22], [35], [42]
devote much more than a page to the topic.

A remarkable feature of de Finetti's theorem is that there are many
ways to state essentially the same result, depending on the desired
emphasis. This feature is best seen when you state results more in words
rather than symbols, so that's what I shall do. Take a nice space $S$, and
either don't worry what 'nice' means, or assume $S = \mathbb{R}$. Write
$\mathcal{P}(S)$ for the space of probability measures on $S$. Write $\mu$
for a typical element of $\mathcal{P}(S)$ and write $\alpha$ for a typical
random element of $\mathcal{P}(S)$, that is a typical random measure. When
we define an infinite exchangeable sequence of $S$-valued random variables
we are really defining an exchangeable measure ($\Theta$, say) in
$\mathcal{P}(S^\infty)$, where $\Theta$ is the distribution of the sequence.

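Functional analysis viewpoint. The subset
$\{\mu^\infty = \mu \times \mu \times \mu \times \cdots : \mu \in \mathcal{P}(S)\} \subset \mathcal{P}(S^\infty)$
is the set of extreme points of the convex set of
exchangeable elements of $\mathcal{P}(S^\infty)$, and the identification

$\Theta(\cdot) = \int_{\mathcal{P}(S)} \mu^\infty(\cdot) \, \Lambda(d\mu)$

gives a bijection between probability measures $\Lambda$ on $\mathcal{P}(S)$
(that is, $\Lambda \in \mathcal{P}(\mathcal{P}(S))$) and exchangeable
measures $\Theta$ in $\mathcal{P}(S^\infty)$.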

Probability viewpoint. Here are successively more explicit versions
of the same idea. Let $(Z_i, 1 \le i < \infty)$ be exchangeable $S$-valued.

(a) Conditional on the tail (or invariant or exchangeable) $\sigma$-field of the
sequence $(Z_i)$, the random variables are i.i.d.

(b) There exists a random measure $\alpha$ such that, conditional on $\alpha = \mu$,
the random variables $Z_i$ are i.i.d. ($\mu$).

(c) Giving $\mathcal{P}(S)$ the topology of weak convergence, the empirical
measure $F_n = F_n(\omega, \cdot) = n^{-1} \sum_{i=1}^{n} 1_{(Z_i(\omega) \in \cdot)}$
converges a.s. to a limit random measure $\alpha(\omega, \cdot)$ satisfying (b).
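Versions (b) and (c) can be illustrated numerically. The following is a minimal sketch, assuming the two-point space S = {0, 1} and a Uniform(0, 1) directing parameter; both choices, and the name `sample_exchangeable`, are illustrative assumptions rather than anything specified in the text. It draws the random measure first, then an i.i.d. sequence given that measure, and compares the empirical frequency with the hidden parameter.

```python
import random

def sample_exchangeable(n, rng):
    """Draw alpha ~ Uniform(0,1), then Z_1..Z_n i.i.d. Bernoulli(alpha).

    The resulting sequence is exchangeable; alpha plays the role of the
    directing random measure in version (b).
    """
    alpha = rng.random()
    z = [1 if rng.random() < alpha else 0 for _ in range(n)]
    return alpha, z

rng = random.Random(0)
alpha, z = sample_exchangeable(20000, rng)
# Empirical measure of {1}; by version (c) it should be close to alpha.
empirical = sum(z) / len(z)
```

Different seeds give different values of `alpha`, but in each run `empirical` tracks that run's `alpha`, which is the sense in which the limit is random.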

Theoretical statistics viewpoint. In contexts where a frequentist
would assume data are i.i.d. ($\mu$) from an unknown distribution $\mu$, a
Bayesian would put a prior distribution $\Lambda$ on possible $\mu$; so de
Finetti's theorem is saying that the Bayesian assumption is logically
equivalent to the assumption that the data $(Z_i, i \ge 1)$ are exchangeable.
Note a mathematical consequence. There is a posterior distribution
$\Lambda_n(\omega, \cdot) \in \mathcal{P}(\mathcal{P}(S))$ for $\Lambda$ given
$(Z_1, \ldots, Z_n)$, and an extension of (c) above is

(d) $\Lambda_n(\omega, \cdot) \to \delta_{\alpha(\omega)}(\cdot)$ a.s. in $\mathcal{P}(\mathcal{P}(S))$.

Such results are historically often used as a starting point for
philosophical and mathematical discussion of consistency/inconsistency of
frequentist and Bayesian methods, inspired of course by Bruno de Finetti
himself.



But there's more! Later we encounter at least two further, somewhat
different, viewpoints: explicit constructions (section 2.7), and our central
theme of using exchangeability to describe complex structures (sections
3 and 4). This theme is related to the general features that

• exchangeable-like properties are preserved under weak convergence; 

• parallel to representation theorems for infinite exchangeable-like
structures, are convergence theorems giving necessary and sufficient
conditions for finite exchangeable-like structures to converge in
distribution to an infinite such structure.

In the setting of de Finetti's theorem, the condition for finite
exchangeable sequences $X^{(n)} = (X^{(n)}_1, \ldots, X^{(n)}_n)$ to converge
in distribution to an infinite exchangeable sequence $X$ is

$\alpha_n \stackrel{d}{\to} \alpha$ on $\mathcal{P}(S)$

where $\alpha$ is the 'directing' random measure for $X$ in (b) above, and
$\alpha_n$ is the empirical distribution of $(X^{(n)}_1, \ldots, X^{(n)}_n)$.
Note that when we talk of convergence in distribution to infinite sequences
or arrays, we mean w.r.t. product topology, i.e. convergence of finite
restrictions.



2.2 Exchangeability, 25 years ago 



Here I list topics from the two old surveys [44], [3], for the purpose of
saying a few words about those topics I will not mention further, while 
pointing to sections where other topics will be discussed further. 

Classical topics not using de Finetti's theorem.

(a) Combinatorial aspects for classical stochastic processes, e.g. ballot
theorems: [56].

(b) Weak convergence for 'sampling without replacement' processes (e.g.
[17] Thm 24.1).

Variants of de Finetti's theorem. Several variants were already
classical, for instance:

(c) Schoenberg's^1 theorem ([3] (3.6)) for the special case of spherically
symmetric sequences;

(d) the analogous representation ([3] (3.9)) in the setting of two
sequences $(X_i, 1 \le i < \infty; Y_j, 1 \le j < \infty)$ whose joint
distribution is invariant under finite permutations of either;

(e) the selection property ( [i^ p. 188), that the exchangeability
hypothesis in de Finetti's theorem can be weakened to the assumption

$(X_1, X_2, \ldots, X_n) \stackrel{d}{=} (X_{k_1}, X_{k_2}, \ldots, X_{k_n})$

for all $1 \le k_1 < k_2 < \cdots < k_n$.
Other variants had been developed in the 1970s, for instance:

(f) the analog for continuous-time processes with exchangeable
increments [40];

(g) Kingman's paintbox theorem for exchangeable random partitions;
see section 3.1.



Finite versions. The general forms of de Finetti's theorem and some 
classical variants can be proved by comparing sampling with and without 
replacement. This method [24] also yields finite-n variants.



Mathematical population genetics, the coalescent and the
Poisson-Dirichlet distribution. Exchangeability is involved in this large
circle of ideas, developed in part by Kingman in the 1970s, which
continues to prove fruitful in many ways. For the population genetics
aspects of Kingman's work see the article by Ewens and Watterson [33] in
this volume; also the Kingman coalescent fits into the more general
stochastic coalescent material in section 3.2.



The subsequence principle. The idea emerged in the 1970s that, 
from any tight sequence of random variables, one can extract a sub- 
sequence which is close to exchangeable, close enough that one can prove 
analogs of classical limit theorems (CLT and LIL, for instance) for the 
subsequence. General versions of this principle were established in [l|,[lB|, 
which pretty much killed the topic. 

^1 Persi Diaconis observes that the result is hard to deduce from Schoenberg [54]
and should really be attributed to Freedman [34].

Sufficient statistics and mixtures of Markov chains. One can
often make a Bayesian interpretation of 'sufficient statistic' in terms of
some context-dependent invariance property [25]. Somewhat similarly,
one can characterize mixtures of Markov chains via the property that
transition counts are sufficient statistics [23].



2.3 Partially exchangeable arrays 

The topic, emerging around 1980, of partially exchangeable arrays, plays
a role in what follows and so requires more attention. Take a measurable
function $f : [0,1]^2 \to S$ which is symmetric, in the sense
$f(x,y) = f(y,x)$. Take $(U_i, i \ge 1)$ i.i.d. Uniform(0, 1) and consider
the array

$X_{\{i,j\}} := f(U_i, U_j)$    (2.1)

indexed by the set $\mathbb{N}_{(2)}$ of unordered pairs $\{i,j\}$. The
exchangeability property of $(U_i)$ implies what we shall call the partially
exchangeable property for the array:

$(X_{\{i,j\}}) \stackrel{d}{=} (X_{\{\pi(i),\pi(j)\}})$ for each finite permutation $\pi$.    (2.2)

Note this is a weaker property than the 'fully exchangeable' property for
the countable collection $(X_{\{i,j\}}, \{i,j\} \in \mathbb{N}_{(2)})$, because
the permutations of $\mathbb{N}_{(2)}$ which are of the particular form
$\{i,j\} \mapsto \{\pi(i), \pi(j)\}$ for a finite permutation $\pi$ of
$\mathbb{N}$ are only a subset of all permutations of $\mathbb{N}_{(2)}$.

Aside from construction (2.1), how else can one produce an array with
this partially exchangeable property? Well, an array with i.i.d. entries
has the property, and so does the trivial case where all entries are the
same r.v. After a moment's thought we realize we can combine these
ideas as follows.

Take a function $f : [0,1]^4 \to S$ such that $f(u, u_1, u_2, u_{12})$ is
symmetric in $(u_1, u_2)$, and then define

$X_{\{i,j\}} := f(U, U_i, U_j, U_{\{i,j\}})$    (2.3)

where all the r.v.s in the families $U$, $(U_i, i \in \mathbb{N})$,
$(U_{\{i,j\}}, \{i,j\} \in \mathbb{N}_{(2)})$ are i.i.d. Uniform(0, 1). Then
the resulting array $X = (X_{\{i,j\}})$ is partially exchangeable.
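The construction just described can be sketched in code for a finite n x n corner of the array. This is only an illustration: the function names and the particular choice of f (a random-graph style edge indicator) are assumptions made here, not taken from the text.

```python
import random

def partially_exchangeable_array(n, f, rng):
    """Return {frozenset({i, j}): X_ij} for 0 <= i < j < n.

    One global uniform U, one uniform per index, and one uniform per
    unordered pair are fed into f, which must be symmetric in its
    second and third arguments.
    """
    u = rng.random()
    u_i = [rng.random() for _ in range(n)]
    x = {}
    for i in range(n):
        for j in range(i + 1, n):
            u_ij = rng.random()
            x[frozenset((i, j))] = f(u, u_i[i], u_i[j], u_ij)
    return x

def example_f(u, u1, u2, u12):
    # Hypothetical choice: edge present with probability u * u1 * u2.
    # Symmetric in (u1, u2) because u1 * u2 == u2 * u1.
    return 1 if u12 < u * (u1 * u2) else 0

rng = random.Random(1)
x = partially_exchangeable_array(5, example_f, rng)
```

With this f the array is the adjacency structure of a random graph: the global variable sets the overall density, the per-index variables set vertex propensities, and the per-pair variables supply independent noise.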

Finding oneself unable to devise any other constructions, it becomes
natural to conjecture that every partially exchangeable array has a
representation (in distribution) of form (2.3). This was proved by Hoover
[...] proof having been substantially simplified due to a personal
communication from Kingman.

Constructions of partially exchangeable arrays appear in Bayesian 
statistical modeling; see e.g. the family of copulae introduced in [l^ 
in the context of a semi-parametric model for Value at Risk. 



2.4 Fast forward 

Such partially exchangeable representation theorems were the state of the
art in the 1984 survey [3]. They were subsequently extended
systematically by Kallenberg, both for arrays and analogs such as
exchangeable-increments continuous-parameter processes, and for the
rotatable matrices to be mentioned in section 2.6, during the late 1980s
and early 1990s. The whole topic of representation theorems is the subject
of Chapters 7-9 of Kallenberg's 2005 monograph [43]. Not only does this
monograph provide a canonical reference to the theorems, but also its
introduction provides an excellent summary of the topic.
In the particular setting above we have 

Theorem 2.1 (Partially Exchangeable Representation Theorem) An
array $X$ which is partially exchangeable, in the sense (2.2), has a
representation in the form (2.3).

This is one of the family of results described carefully in Chapter 7
of [43]. There are analogous results for higher-parameter arrays $(X_{ijk})$,
and for arrays in which the 'joint exchangeability' assumption (2.2) is
replaced by a 'separate exchangeability' assumption for non-symmetric
arrays $(X_{i,j}, 1 \le i,j < \infty)$:

$(X_{i,j}, 1 \le i,j < \infty) \stackrel{d}{=} (X_{\pi_1(i),\pi_2(j)}, 1 \le i,j < \infty)$

for finite permutations $\pi_1, \pi_2$.

One aspect of this theory is surprisingly subtle, and that is the issue
of uniqueness of representing functions $f$. In representation (2.3), if we
take Lebesgue-measure-preserving maps $\phi_0, \phi_1, \phi_2$ from [0,1] to
[0,1], then the arrays $X$ and $X^*$ obtained from $f$ and from
$f^*(u, u_1, u_2, u_{12}) := f(\phi_0(u), \phi_1(u_1), \phi_1(u_2), \phi_2(u_{12}))$
must have the same distribution. But
this is not the only way to make arrays have the same distribution:
there are other ways to construct measure-preserving transformations of
$[0,1]^4$, and (because measure-preserving transformations are not
invertible in general) one needs to insert randomization variables. (I thank
the referee for correcting a blunder in my first draft, and for the
comment "this may be a major reason why non-standard analysis is effective
here".) For an explicit statement of the uniqueness result in two
dimensions see [41] and for higher dimensions see [42].

Relative to the Big Picture of Mathematics, this theory of partial
exchangeability was perhaps regarded during 1980-2005 as a rather small
niche inside mathematical probability, and ignored outside mathematical
probability. So it is ironic that around 2004-8 it was rediscovered
in at least three different contexts outside mainstream mathematical
probability. Let me describe one such context right now and the others
later (sections 5.1 and 5.2).



2.5 Isometry classes of metric spaces with probability measures

The definition of isometry between two metric spaces $(S_1, d_1)$ and
$(S_2, d_2)$ contains an 'if there exists ...' expression. Asking for a
characterization of metric spaces up to isometry is asking for a scheme that
associates some notion of 'label' to each metric space in such a way that
two metric spaces are isometric if and only if they have the same label.
I am not an expert on this topic, but I believe there is no known such
characterization.

But suppose instead we consider 'metric spaces with probability measure',
$(S_1, d_1, \mu_1)$ and $(S_2, d_2, \mu_2)$, and require the isometry to map
$\mu_1$ to $\mu_2$. It turns out there is now a conceptually simple
characterization. Given $(S, d, \mu)$, take i.i.d. ($\mu$) random elements
$(\xi_i, 1 \le i < \infty)$ of $S$, form the array (of form (2.1))

$X_{\{i,j\}} = d(\xi_i, \xi_j), \quad \{i,j\} \in \mathbb{N}_{(2)}$    (2.4)

and let $\Psi$ be the distribution of the infinite random array. It is obvious
that, for two isometric 'metric spaces with probability measure', we get
the same $\Psi$, and the converse is a simple albeit technical consequence of
the uniqueness part of Theorem 2.1, implying:

'metric spaces with probability measure' are characterized
up to isometry by the distribution $\Psi$.    (2.5)
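To make the sampled-distance array concrete, here is a small sketch that draws a finite k x k corner of it. The example space (the unit circle with arc-length metric and uniform measure) and all function names are assumptions chosen for illustration; the text does not single out any particular space.

```python
import math
import random

def sampled_distance_array(k, sample_point, dist, rng):
    """Sample k i.i.d. points and return their k x k matrix of distances."""
    pts = [sample_point(rng) for _ in range(k)]
    return [[dist(pts[i], pts[j]) for j in range(k)] for i in range(k)]

def circle_point(rng):
    # Uniform point on the unit circle, recorded as an angle in [0, 2*pi).
    return rng.random() * 2 * math.pi

def circle_dist(a, b):
    # Arc-length (geodesic) distance between two angles.
    d = abs(a - b)
    return min(d, 2 * math.pi - d)

rng = random.Random(2)
d = sampled_distance_array(6, circle_point, circle_dist, rng)
```

Rotating or reflecting the circle changes the sampled points but not the law of the matrix `d`, which is the easy direction of the characterization; the converse is the part that needs the uniqueness theory.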

This result was given by Vershik [s?! , as one rediscovery of part of the 
general theory of partial exchangeability.

2.6 Rotatable arrays and random matrices with symmetry properties
In Theorem 2.1 we described $X = (X_{ij})$ as an array instead of a matrix,
partly because of the extension to higher-dimensional parametrizations
and partly because we never engage matrix multiplication. Now regarding
$X$ as a matrix, one can impose stronger 'matrix-theoretic' assumptions
and ask for characterizations of the random matrices satisfying
such assumptions. One basic case, rotatable matrices, is where the $n \times n$
restrictions are invariant in distribution under the orthogonal group,
and the characterization goes back to [31]. Two other cases (I thank the
referee for suggesting (ii) and the subsequent remark) are

(i) non-negative definite jointly exchangeable arrays: [50], [51];

(ii) rotatable Hermitian matrices [50], motivated indirectly by problems
in quantum mechanics and thereby related to the huge literature on
semicircular laws.

Returning to the basic case of rotatable matrices, for the
higher-dimensional analogs the basic representations are naturally stated in
terms of multiple Wiener-Ito integrals, which form the fundamental
examples of rotatable random functionals. Such multiple Wiener-Ito
integrals are also a basic tool in [49], a subject with important
applications to analysis.



2.7 Revisiting de Finetti's theorem 

Returning to a previous comment, the theory of partially exchangeable
representation theorems reminds us that one can take a similar view of
de Finetti's theorem itself, to add to the list in section 2.1.

Construction viewpoint. Given a measurable function $f : [0,1]^2 \to S$
and i.i.d. Uniform(0, 1) random variables $(U; U_i, i \ge 1)$, the process
$(Z_i, i \ge 1)$ defined by $Z_i := f(U, U_i)$ is exchangeable, and every
exchangeable process arises (in distribution) in this way from some $f$.



3 Using exchangeability to describe complex structures

Here is my attempt at articulating the first part of the central theme of
this paper.

One way of examining a complex mathematical structure is to sample
i.i.d. random points and look at some form of induced substructure
relating the random points.

The idea being that the i.i.d. sampling induces some kind of 'exchange- 
ability' on the distribution of the substructure, when the substructure 
is regarded as an object in its own right. 

The 'isometry' result (2.5) nicely fits this theme: the substructure is
simply the induced metric on the sampled points. The rest of the present
paper seeks to illustrate that this, admittedly very vague, way of looking
at structures can indeed be useful, conceptually and/or technically. Let
us mention here two prototypical examples (which will reappear later)
of what a 'substructure' might be. Given $k$ vertices $v(1), \ldots, v(k)$ in a
graph, one can immediately see two different ways to define an induced
substructure.

(i) The induced subgraph on vertices $1, \ldots, k$: there is an edge $(i,j)$
iff the original graph has an edge $(v(i), v(j))$.

(ii) The distance matrix: $d(i,j)$ is the number of edges in the shortest
path from $v(i)$ to $v(j)$.
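Both substructures are easy to compute for a finite graph. The sketch below assumes the graph is given as an adjacency-list dictionary and uses 0-based positions 0, ..., k-1 for the sampled vertices (the text uses 1, ..., k); the function names are illustrative.

```python
from collections import deque

def induced_subgraph(adj, v):
    """Edges (i, j), i < j, among positions of the chosen vertices v."""
    k = len(v)
    return {(i, j) for i in range(k) for j in range(i + 1, k)
            if v[j] in adj[v[i]]}

def distance_matrix(adj, v):
    """Graph distances between chosen vertices, via breadth-first search."""
    def bfs(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        return dist
    rows = []
    for a in v:
        d = bfs(a)
        # Unreachable vertices get distance infinity.
        rows.append([d.get(b, float("inf")) for b in v])
    return rows

# Path graph 0 - 1 - 2 - 3 - 4, with chosen vertices 0, 2, 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
v = [0, 2, 3]
edges = induced_subgraph(adj, v)   # only positions 1 and 2 are adjacent
dist = distance_matrix(adj, v)
```

The two substructures carry different information: here the induced subgraph sees a single edge, while the distance matrix also records that the first chosen vertex is two and three steps from the others.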

But before considering graph theoretic examples, let us explain with 
hindsight how Kingman's work on exchangeable random partitions fits 
this theme. 



3.1 Exchangeable random partitions and Kingman's 
paintbox theorem 



The material here is covered in detail in Pitman [52] Chapters 2-4.

Given a discrete sub-probability distribution, one can write the
probabilities in decreasing order as $p_1 \ge p_2 \ge \cdots \ge 0$ and then
write $p_{(\infty)} := 1 - \sum_j p_j \ge 0$ to define a probability
distribution $p$. Imagine objects 1, 2, 3, ... each independently being
colored, assigned color $j$ with probability $p_j$ or assigned with
probability $p_{(\infty)}$ a unique color (different from that assigned to
any other object). Then consider the resulting 'same color' equivalence
classes as a random partition of $\mathbb{N}$. So a realization of this
process might be

{1,5,6,9,13,...}, {2,3,8,11,15,...}, {4}, {7,23,...}, ...;    (3.1)

sets in the partition are either infinite or singletons. This paintbox($p$)
distribution on partitions is exchangeable in the natural sense.
Kingman's paintbox theorem, an analog of de Finetti's theorem, states that
every exchangeable random partition of $\mathbb{N}$ is distributed as a
mixture over $p$ of paintbox($p$) distributions.
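The paintbox coloring scheme can be simulated directly on the first n objects. This is a minimal sketch; the function name and the representation of p as a Python list whose sum is at most 1 (the missing mass being the 'unique color' probability) are assumptions made for the example.

```python
import random

def paintbox_partition(p, n, rng):
    """Partition {1..n}: object gets color j w.p. p[j], else a unique color."""
    blocks = {}        # color index -> list of objects with that color
    singletons = []    # objects that received a unique color
    for obj in range(1, n + 1):
        u = rng.random()
        acc = 0.0
        color = None
        for j, pj in enumerate(p):
            acc += pj
            if u < acc:
                color = j
                break
        if color is None:
            singletons.append([obj])
        else:
            blocks.setdefault(color, []).append(obj)
    # List blocks in order of their least element, as in the realization above.
    return sorted(list(blocks.values()) + singletons, key=lambda b: b[0])

rng = random.Random(3)
part = paintbox_partition([0.5, 0.3], 50, rng)
```

With p = [0.5, 0.3] about a fifth of the objects end up as singletons, which is the finite-n shadow of the 'infinite or singleton' dichotomy.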

We mentioned in section 2.1 as a general feature that, associated with
a representation theorem like this, there will be a convergence theorem.
Here are two slightly different ways of looking at the convergence
theorem in the present setting. Suppose that for each $n$ we are given an
arbitrary random (or non-random) partition $\mathcal{G}(n)$ of
$\{1, 2, \ldots, n\}$. For each $k < n$ sample without replacement $k$ times
from $\{1, 2, \ldots, n\}$ to get $U_n(1), \ldots, U_n(k)$, consider the
induced partition on the sampled elements $U_n(1), \ldots, U_n(k)$, and
relabel these elements as $1, \ldots, k$ to get a random partition $S(n,k)$
of $\{1, 2, \ldots, k\}$. This random partition $S(n,k)$ is clearly
exchangeable. If there is a limit

$S(n,k) \stackrel{d}{\to} S_k$ as $n \to \infty$    (3.2)

(the set of all possible partitions of $\{1, 2, \ldots, k\}$ is finite, so
there is nothing technically sophisticated here) then the limit $S_k$ is
exchangeable; and if a limit (3.2) exists for all $k$ then the family
$(S_k, 1 \le k < \infty)$ specifies the distribution of an exchangeable
random partition of $\mathbb{N}$, to which Kingman's paintbox theorem can be
applied.
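The sampling step that produces the induced partition on k sampled elements can be sketched as follows, assuming the given partition of {1, ..., n} is stored as a dictionary from element to block label (an assumed representation; the function name is likewise hypothetical).

```python
import random

def induced_partition(block_of, n, k, rng):
    """Sample k of {1..n} without replacement; return the induced
    partition of {1..k} after relabelling the samples as 1..k."""
    sampled = rng.sample(range(1, n + 1), k)
    groups = {}
    for new_label, elem in enumerate(sampled, start=1):
        groups.setdefault(block_of[elem], []).append(new_label)
    return sorted(groups.values(), key=lambda b: b[0])

# Example: partition {1..10} into odd and even numbers, then sample k = 4.
block_of = {e: e % 2 for e in range(1, 11)}
s = induced_partition(block_of, 10, 4, random.Random(4))
```

Because the k sampled elements arrive in uniformly random order, the relabelled partition is exchangeable even when the original partition is deterministic, as in this odd/even example.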

The specific phrasing above was chosen to fit a general framework
in section 4.1 later, but here is a more natural phrasing. For any
random partition of $\{1, \ldots, n\}$ write
$F^{(n)} = (F^{(n)}_1, F^{(n)}_2, \ldots)$ for the ranked empirical
frequencies, the numbers $n^{-1} \times$ (sizes of sets in partition) in
decreasing order. For a paintbox($p$) distribution the SLLN implies
$F^{(n)} \to p$ a.s., and so Kingman's paintbox theorem implies that for any
infinite exchangeable random partition $\Pi$, the limit $F^{(n)} \to F$
exists a.s. and is the 'directing random measure' (conditional on $F = p$
the distribution of $\Pi$ is paintbox($p$)). Now suppose for each $n$ we
have an exchangeable random partition $\Pi^{(n)}$ of $\{1, 2, \ldots, n\}$
and write $F^{(n)}$ for its ranked empirical frequencies. The convergence
theorem states that the sequence $\Pi^{(n)}$ converges in distribution
(meaning its restriction to $\{1, \ldots, k\}$ converges, for each $k$) to
some limit $\Pi$, which is necessarily some infinite exchangeable random
partition with some directing random measure $F$, if and only if
$F^{(n)} = (F^{(n)}_j, 1 \le j < \infty) \stackrel{d}{\to} F = (F_j, 1 \le j < \infty)$.

A final important idea is size-biased order. In the context of
exchangeable random partitions this just means writing the components in a
realization, as at (3.1), starting with the component containing element 1,
then the component containing the least element not in the first
component, and so on. In the infinite case, the frequencies
$F^* = (F^*_1, F^*_2, \ldots)$ of the components in size-biased order are
just a random permutation of the frequencies $F$ given by Kingman's paintbox
theorem. In the paintbox($p$) case, replacing non-random $F = p$ by random
$F^*$ is perhaps merely complicating matters, but in the general case of
random $F$ it is often more natural to work with the size-biased order than
with the ranked order. For instance, the size-biased order codes
information such as

$E (F^*_1)^m = E \sum_{i \ge 1} F_i^{m+1}$.
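The moment identity above follows from a one-line conditioning argument. As a sketch, for $m \ge 1$: given the ranked frequencies $F$, element 1 falls in the component of frequency $F_i$ with probability $F_i$ (and in a singleton, contributing frequency 0, with the remaining probability), so

```latex
% Condition on F, then average over which component contains element 1.
\begin{align*}
E\big[ (F_1^*)^m \,\big|\, F \big]
  &= \sum_{i \ge 1} F_i \cdot F_i^{m}
   = \sum_{i \ge 1} F_i^{m+1}, \\
E\, (F_1^*)^m
  &= E \sum_{i \ge 1} F_i^{m+1}.
\end{align*}
```

In particular, taking $m = 1$ shows that the mean of the first size-biased frequency equals the expected sum of squared frequencies, that is, the probability that two sampled elements lie in the same component.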

I am highlighting these 'structural' results as part of my overall theme,
but in many ways the concrete examples are more interesting. The
one-parameter Poisson-Dirichlet($\theta$) family was already recognized 25
years ago as a mathematically canonical family of measures arising in
several different contexts: the Ewens sampling formula in neutral population
genetics; the 'Chinese restaurant process' construction; a construction
via subordinators; the GEM distribution, which is the size-biased order of
its asymptotic frequencies; and special cases arise as limits of component
sizes in random permutations and in random mappings. Subsequently the
two-parameter Poisson-Dirichlet($\alpha, \theta$) distribution introduced by
Pitman-Yor [53] was shown to possess many analogous properties. The paper
[37] in this volume gives the flavor of current work in this direction.
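Of the constructions listed, the Chinese restaurant process is the easiest to simulate. The sketch below uses the standard one-parameter seating rule (customer n+1 starts a new table with probability theta/(theta+n), otherwise joins an existing table with probability proportional to its size); the rule is stated here as background knowledge rather than taken from the text, and the function name is hypothetical.

```python
import random

def chinese_restaurant(n, theta, rng):
    """Seat customers 1..n; return the tables in order of creation."""
    tables = []
    for customer in range(1, n + 1):
        # Total weight: theta for a new table plus one per seated customer.
        u = rng.random() * (theta + customer - 1)
        if u < theta:
            tables.append([customer])
            continue
        u -= theta
        for t in tables:
            if u < len(t):
                t.append(customer)
                break
            u -= len(t)
        else:
            # Guard against floating-point rounding at the upper boundary.
            tables[-1].append(customer)
    return tables

tables = chinese_restaurant(200, 1.5, random.Random(5))
sizes = [len(t) for t in tables]
```

Tables are created in order of their least element, so `sizes` divided by n lists the component frequencies in size-biased order, matching the convention described earlier in this section.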

Now partitions are rather simple structures, and the paintbox theorem 
(which can be derived from de Finetti's theorem) isn't so convincing as 
an illustration of the theme 'using exchangeability to describe complex 
structures'. The theme becomes more visible when we consider the more 
complex setting of partitions evolving over time, and this setting arises 
naturally in the following context. 



3.2 Stochastic coalescence and fragmentation 

The topic of this section is treated in detail in Bertoin [16], the third
monograph in which exchangeability has recently featured prominently.
The topic concerns models in which, at each time, unit mass is split into
clusters of masses $\{x_j\}$. One studies models of dynamics under which
clusters split (stochastic fragmentation) or merge (stochastic coalescence
or coagulation^2) according to some random rules.

^2 The word coagulation, introduced in German in [55], sounds strange to the
native English speaker to whom it suggests blood clotting; coalescence seems a
more apposite English word.

Conceptually, states are unordered collections $\{x_j\}$ of positive
numbers with $\sum_j x_j = 1$. What is a good mathematical representation of
such states? The first representation one might devise is as vectors in
decreasing order, say $\mathbf{x}^\downarrow = (x_1, x_2, \ldots)$. But this
representation has two related unsatisfactory features: fragmenting one
cluster forces one to relabel the others; and given the realizations at two
times, one can't identify a particular cluster at the later time as a
fragment of a particular cluster at the earlier time. These difficulties go
away if we think instead in terms of sampling 'atoms' and tracking which
cluster they are in. A uniform random atom will be in a mass-$x$ cluster of
a configuration $\mathbf{x}^\downarrow$ with probability $x$; sampling atoms
$i = 1, 2, 3, \ldots$ and taking the partition of $\{1, 2, 3, \ldots\}$ into
'atoms of the same cluster' components gives an exchangeable random
partition $\Pi$ with paintbox($\mathbf{x}^\downarrow$) distribution.

Thus instead of representing a process as
$(\mathbf{X}^\downarrow(t), 0 \le t < \infty)$ we can represent it as a
partition-valued process $(\Pi(t), 0 \le t < \infty)$ which tracks the
positions of (i.e. the clusters containing) particular atoms. For fixed $t$,
both $\Pi(t)$ and $\mathbf{X}^\downarrow(t)$ give the same information about
the cluster masses, and note that clusters in $\Pi(t)$ automatically appear
in size-biased order. But as processes, $(\Pi(t), 0 \le t < \infty)$ gives
more information than $(\mathbf{X}^\downarrow(t), 0 \le t < \infty)$, and in
particular avoids the unsatisfactory features mentioned above.

Now in one sense this is merely a technical device, but I find it does 
give some helpful insights. 

The basic general stochastic models. In the basic model of stochastic
fragmentation, different clusters evolve independently, a mass-$x$ cluster
splitting at some stochastic rate $\lambda_x$ into clusters whose relative
masses $(x_j/x, j \ge 1)$ follow some probability distribution
$\mu_x(\cdot)$. (So the model neglects detailed 3-dimensional geometry; the
shape of a cluster is assumed not to affect its propensity to split, and
different clusters do not interact.) Especially tractable is the
self-similar case where $\mu_x = \mu_1$ and $\lambda_x = x^\alpha$ for some
scaling exponent $\alpha$. Such processes are closely related to classical
topics in theoretical and applied probability: the log-masses form a
continuous-time branching random walk, and the mass of the cluster
containing a sample atom forms a continuous-time Markov process on state
space (0, 1].

The basic model for stochastic coalescence is to have $n$ particles,
initially in single particle clusters of masses $1/n$, and let clusters merge
according to a kernel $K(x, x')$ indicating the rate (probability per unit
time) at which a typical pair of clusters of masses $x$ and $x'$ may merge.
For fixed $n$ this is just a finite-state continuous-time Markov chain, but
it is natural to study $n \to \infty$ limits, and there are two different
regimes. On the time-scale where typical clusters contain $O(1)$ particles,
i.e. have mass $O(1/n)$, there is an intuitively natural hydrodynamical
limit (law of large numbers), that is differential equations for the
relative proportions $y_j(t)$ of $j$-particle clusters in the $n \to \infty$
limit. This Smoluchowski
coagulation equation has a long history in several areas of science such
as physical chemistry, as indicated in the survey Q. Recent theoretical 
work has made rigorous the connection between the stochastic and
deterministic models, and part of this is described in [16] Chapter 5. A
different limit regime concerns the time-scale when the largest clusters
contain order $n$ particles, i.e. have mass of order 1. In this limit we have
real-valued cluster sizes evolving over time $(-\infty, \infty)$ and
'starting with dust' at time $-\infty$, that is with the largest cluster
mass $\to 0$ as $t \to -\infty$ (just as, in the basic fragmentation model,
the largest cluster mass $\to 0$ as $t \to +\infty$) and $\to 1$ as
$t \to +\infty$. (So these models incidentally provide novel examples within
the classical topic of entrance boundaries for Markov processes.)

Finally recall Kingman's coalescent, as a model of genealogical lines 
of descent within neutral population genetics, which (with its many sub- 
sequent variations) has become a recognized topic within mathematical 
population genetics — see e.g. Wakeley [HI]. Although the background 
story is different, it can mathematically be identified with the constant
rate ($K(x,x') = 1$) stochastic coalescent in the present context.
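The constant-rate coalescent restricted to k sampled lineages is simple enough to simulate in a few lines. The sketch below assumes the usual normalization in which each pair of blocks merges at rate 1, so that with m blocks the next merger occurs after an exponential time of rate m(m-1)/2; that time scaling and the function name are assumptions made here for illustration.

```python
import random

def kingman_coalescent(k, rng):
    """Simulate the coalescent on {1..k}; return [(time, partition), ...]."""
    blocks = [[i] for i in range(1, k + 1)]
    t = 0.0
    history = [(t, [list(b) for b in blocks])]
    while len(blocks) > 1:
        m = len(blocks)
        # Each of the m*(m-1)/2 pairs merges at rate 1.
        t += rng.expovariate(m * (m - 1) / 2)
        i, j = sorted(rng.sample(range(m), 2))
        blocks[i] = sorted(blocks[i] + blocks[j])
        del blocks[j]
        history.append((t, [list(b) for b in blocks]))
    return history

history = kingman_coalescent(6, random.Random(6))
```

Each entry of `history` is a partition of the sampled lineages, so the output is a finite piece of the partition-valued process described in this section, read in the coalescing direction of time.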

Discussion and special cases. There are three settings above
(fragmentation; discrete-particle coalescence; continuous-mass coalescence)
which one might formalize differently, but the advantage of the
'exchangeable random partition' set-up is that each can be represented as
a partition-valued process $(\Pi(t))$. Intuitively, coalescence and
fragmentation are time-reversals of each other, and it is noteworthy that

(i) there are several fascinating examples of special models where a
precise duality relation exists and is useful (see e.g. section 4.4 (iv));

(ii) but there seems to be no general precise duality relationship within
the usual stochastic models.

In the general models the processes $(\Pi(t))$ are all Markov, as processes
on partitions of $\mathbb{N}$. One can consider their restrictions
$(\Pi_k(t))$ to partitions of $\{1, \ldots, k\}$, i.e. consider masses of the
components containing $k$ sampled atoms. In general $(\Pi_k(t))$ will not be
Markov, but it is Markov in special cases, which are therefore particularly
tractable. One such case ([16] section 3.1) is homogeneous fragmentation,
where each cluster has the same splitting rate ($\lambda_x = \lambda_1$).
Another such case ([16] Chapter 4) is the elegant general theory of
exchangeable coalescents, which eliminates the 'only binary merging' aspect
of Kingman's coalescent, and is interpretable as $n \to \infty$ limit
genealogies of more general models in population genetics.

4 Construction of, and convergence to, infinite 
random combinatorial objects 

4.1 A general program 

de Finetti's theorem refers specifically to infinite sequences. Of course 
we can always try to view an infinite object as a limit of finite objects, 
and in the '25 years ago' surveys such convergence ideas were explicit 
in the context of weak convergence for 'sampling without replacement' 
processes, in finite versions such as [24], and in some other contexts, such
as Kingman's theory of exchangeable random partitions. I previously 
stated the first part of our central theme as 

One way of examining a complex mathematical structure is to sam- 
ple i.i.d. random points and look at some form of induced substruc- 
ture relating the random points 

which assumes we are given the complex structure. But now the second 
and more substantial part of the theme is that we can often use ex- 
changeability in the construction of complex random structures as the 
$n \to \infty$ limits of random finite n-element structures $\mathcal{G}(n)$. 

Within the n-element structure $\mathcal{G}(n)$ pick k random elements, look 
at the induced substructure on these k elements; call this $\mathcal{S}(n,k)$. 
Take a limit (in distribution) as $n \to \infty$ for fixed k, any necessary 
rescaling having been already done in the definition of $\mathcal{S}(n,k)$; call 
this limit $\mathcal{S}_k$. Within the limit random structures $(\mathcal{S}_k, 2 \le k < \infty)$, 
the k elements are exchangeable, and the distributions are consistent
as k increases and therefore can be used to define an infinite 
structure $\mathcal{S}_\infty$. 

Where one can implement this program, the random structure $\mathcal{S}_\infty$ will 
for many purposes serve as an $n \to \infty$ limit of the original n-element 
structures. Note that $\mathcal{S}_\infty$ makes sense as a rather abstract object, via 
the Kolmogorov extension theorem, but in concrete cases one tries to 
identify it with some more concrete construction. 



4.2 First examples 

To invent a name for the program above, let's call it exchangeable
representations of complex random structures. Let me first mention three 
examples. 

1. Our discussion (section 3.1) of exchangeable random partitions fits 
the program but is atypically simple, in that the limit $\mathcal{S}_\infty$ is visibly 
the same kind of object (an exchangeable random partition) as is the 
finite object $\mathcal{G}(n)$. But when we moved on to coalescence and
fragmentation processes in section 3.2 our 'exchangeability' viewpoint 
prompts consideration of the limit process as the partition-valued 
process $(\Pi(t))$, which is rather different from the finite-state Markov 
processes arising in coalescence for finite n. 

2. A conceptually similar example arises in the technically more sophisticated
setting of measure-valued diffusions $(\mu_t)$. In such processes the 
states are probability measures $\mu$ on some type-space, representing a 
'continuous' infinite population. But one can alternately represent $\mu$ 
via an infinite i.i.d. sequence of samples from $\mu$, and thereby represent
the state more directly as a discrete countable infinite population 
$(Z(i), i \ge 1)$ and the process as a particle process $(Z_t(i), i \ge 1)$. This 
viewpoint was emphasized in the look-down construction of Kurtz- 
Donnelly [iilii]. 



3. As a complement to the characterization (2.5) of metric spaces with 
a probability measure (p.m.), we can define a notion of convergence 
of such objects, say of finite metric spaces with p.m. $(S_n, d_n, \mu_n)$ to 
a continuous limit $(S_\infty, d_\infty, \mu_\infty)$. The definition is simply that we 
have weak convergence 

$X^n \xrightarrow{d} X^\infty$

of the induced random arrays defined as at (2.4): 

$X^n_{i,j} = d_n(\xi^n_i, \xi^n_j), \quad i, j \ge 1$

for i.i.d. $(\xi^n_i, i \ge 1)$ with distribution $\mu_n$. This definition provides 
an intriguing complement to the more familiar notion of Gromov-Hausdorff
distance between compact metric spaces. 

Let us move on to the fundamental setting where the program gives 
a substantial payoff, the formalization of continuum random trees as 
rescaled limits of finite random trees. 



4.3 Continuum random trees 

The material here is from the old survey [J] . 

Probabilists are familiar with the notion that rescaled random walk 
converges in distribution to Brownian motion. Now in the most basic 
case (simple symmetric RW of length n) we are studying the uniform 
distribution on a combinatorial set, the set of all $2^n$ simple walks of 
length n. So what happens if we study instead the uniform distribution 
on some other combinatorial set? Let us consider the set of n-vertex 
trees. More precisely, consider either the set of rooted labeled trees
(Cayley's formula says there are $n^{n-1}$ such trees), or the set of rooted ordered 
trees (counted by the Catalan number $\frac{1}{n}\binom{2n-2}{n-1}$); write $T_n$ for the 
uniform random tree. 

Trees fit nicely into the 'substructure' framework. Vertices $v(1), \ldots, v(k)$
of a tree define a spanning (sub)tree. Take each maximal path 
$(w_0, w_1, \ldots, w_\ell)$ in the spanning tree whose intermediate vertices have 
degree 2, and contract to a single edge of length $\ell$. Applying this to k 
independent uniform random vertices from an n-vertex model $T_n$, then 
rescaling edge-lengths by the factor $n^{-1/2}$, gives a tree we'll call $\mathcal{S}(n,k)$. 
We visualize such trees as in Figure 4.1, vertex $v(i)$ having been relabeled 
as i. 

Figure 4.1 A leaf-labeled tree with edge-lengths. 

In the two models for $T_n$ mentioned above, one can do explicit
calculations of the distribution of $\mathcal{S}(n,k)$, and use these to show that in 
distribution there is an $n \to \infty$ limit $\mathcal{S}_k$ which (up to a model-dependent 
scaling constant we'll ignore in this informal exposition) is the following 
distribution. 

(i) The state space is the space of trees with k leaves labeled 1, 2, 
..., k and with unlabeled degree-3 internal vertices, and where the 
$2k - 3$ edge-lengths are positive real numbers. 

(ii) For each possible topological shape, the chance that the tree has 
that particular shape and that the vector of edge-lengths $(L_1, \ldots, L_{2k-3})$
is in $([l_i, l_i + dl_i], 1 \le i \le 2k-3)$ equals
$s \exp(-s^2/2)\, dl_1 \cdots dl_{2k-3}$, where $s = \sum_i l_i$. 

One can check from the explicit formula what must be true from the 
general program, that for fixed k the distribution is exchangeable (in 
labels 1, ..., k), and the distributions are consistent as k increases (that 
is, the subtree of $\mathcal{S}_{k+1}$ spanned by leaves 1, ..., k is distributed as $\mathcal{S}_k$). 
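
As a sanity check on (ii), one can verify that the stated density has total mass 1. The sketch below assumes the standard count of $(2k-5)!!$ topological shapes for trees with k labeled leaves and degree-3 internal vertices (with the convention $(-1)!! = 1$ when k = 2); that count is not stated in the text and is brought in here only for the check.

```latex
% The slice {l in R_+^{2k-3} : s <= l_1 + ... + l_{2k-3} <= s + ds} has volume
% s^{2k-4}/(2k-4)! ds, so for one fixed shape
\int_{\mathbb{R}_+^{2k-3}} s\, e^{-s^2/2}\, dl_1 \cdots dl_{2k-3}
  = \int_0^\infty \frac{s^{2k-4}}{(2k-4)!}\, s\, e^{-s^2/2}\, ds
  = \frac{2^{k-2}\,(k-2)!}{(2k-4)!}
  = \frac{1}{(2k-5)!!},
% using \int_0^\infty s^{2k-3} e^{-s^2/2}\, ds = 2^{k-2}(k-2)!  (substitute u = s^2/2)
% and (2k-4)! = (2k-4)!!\,(2k-5)!! = 2^{k-2}(k-2)!\,(2k-5)!!.
% Summing over the (2k-5)!! shapes gives total mass 1.
```

For k = 2 there is a single edge and the formula reduces to the density $s \exp(-s^2/2)$ of its length.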

So some object $\mathcal{S}_\infty$ exists, abstractly, but what is it, more concretely? 
A real tree is a metric space with the 'tree' property that between any 
two points v and w there is a unique path. This implicitly specifies a 
length measure $\lambda$ such that the metric distance $d(v,w)$ equals the length 
measure of the set of points on the path from v to w. When a real tree is 
equipped with a mass measure of total mass 1, representing a method 
for picking a vertex at random, I call it a continuum tree. We will consider
random continuum trees (which I call continuum random trees or 
CRTs because it sounds better!) and the Portmanteau Theorem below 
envisages realizations of $\mathcal{S}_\infty$ as being equipped with a mass measure. 

Returning to the n-vertex random tree models $T_n$, by assigning 
'mass' 1/n to each vertex we obtain the analogous 'mass measure' on the 
vertices, used for randomly sampling vertices. The next result combines 
existence, construction and convergence theorems. The careful reader 
will notice that some details in the statements have been omitted. 
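
To make the finite-n sampling concrete, here is a minimal sketch of the k = 2 case of $\mathcal{S}(n,k)$ in the labeled-tree model: it draws a uniform random labeled tree, samples two vertices from the mass measure (mass 1/n per vertex), and returns their graph distance rescaled by $n^{-1/2}$. Generating the tree by decoding a uniform Prüfer sequence is an assumption of this sketch (a standard technique, not one named in the text), the root is ignored because it does not affect the distance between two sampled vertices, and all function names are illustrative.

```python
import heapq
import random
from collections import deque


def random_labeled_tree(n, rng):
    """Edges of a uniform random labeled tree on vertices 0..n-1 (Prufer decoding)."""
    if n < 2:
        return []
    seq = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)      # smallest current leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:                # v has become a leaf
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def graph_distances(n, edges, source):
    """Breadth-first-search distances (edge counts) from source."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def scaled_two_point_distance(n, rng):
    """Distance between two uniform random vertices, rescaled by n^(-1/2)."""
    edges = random_labeled_tree(n, rng)
    u, v = rng.randrange(n), rng.randrange(n)
    return graph_distances(n, edges, u)[v] / n ** 0.5
```

Repeating `scaled_two_point_distance(n, random.Random(seed))` over many seeds gives an empirical picture of the single edge-length of $\mathcal{S}(n,2)$ for large n.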

The Portmanteau Theorem Q, 0] 

1. Law of spanning subtrees. There exists a particular Brownian 
CRT which agrees with $\mathcal{S}_\infty$ in the following sense. Take a realization
of the Brownian CRT, then pick k i.i.d. vertices from the mass 
measure, and consider the spanning subtree on these k vertices. The 
unconditional law of this subtree is the law in (ii) above. 

2. Construction from Brownian excursion. Consider an excursion-type
function $f : [0,1] \to [0,\infty)$ with $f(0) = f(1) = 0$ and $f(x) > 0$ 
elsewhere. Use f to define a continuum tree as follows. Define a 
pseudo-metric on [0, 1] by: $d(x,y) = f(x) + f(y) - 2\min(f(u) : x \le u \le y)$,
$x \le y$. The continuum tree is the associated metric space, 
and the mass measure is the image of Lebesgue measure on [0, 1]. 
Using this construction with standard Brownian excursion (scaled 
by a factor 2) gives the Brownian CRT. 

3. Line-breaking construction. Cut the line $[0, \infty)$ into finite segments
at the points of a non-homogeneous Poisson process of intensity
$\lambda(x) = x$. Build a tree by inductively attaching a segment 
$[x_i, x_{i+1}]$ to a uniform random point of the tree built from the earlier 
segments. The tree built from the first $k - 1$ segments has the law 
(ii) above. The metric space closure of the tree built from the whole 
half-line is the Brownian CRT, where the mass measure is the a.s. 
weak limit of the empirical law of the first k cut-points. 

4. Weak limit of conditioned critical Galton-Watson branching 
processes and of uniform random trees. Take a critical Galton-Watson
branching process where the offspring law has finite non-zero 
variance, and condition on total population until extinction being 
n. This gives a random tree. Rescale edge-lengths to have length 
$n^{-1/2}$. Put mass 1/n on each vertex. In a certain sense that can 
be formalized, the $n \to \infty$ weak limit of these random trees is the 
Brownian CRT (up to a scaling factor). This result includes as special 
cases the two combinatorial models $T_n$ described above. 
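
Constructions 2 and 3 are concrete enough to sketch in code. The first function below evaluates the pseudo-metric of item 2 for a function sampled on a finite grid (the discretization is an assumption of this sketch). The other two simulate the line-breaking construction of item 3, generating the cut-points as $\sqrt{2\Gamma_i}$ with $\Gamma_i$ the arrival times of a rate-1 Poisson process, which is one standard way to realize intensity $\lambda(x) = x$, and reading 'uniform random point of the tree' as uniform with respect to length. All names are hypothetical.

```python
import bisect
import math
import random


def excursion_distance(f, i, j):
    """d(x, y) = f(x) + f(y) - 2 min_{x <= u <= y} f(u), on grid indices i, j."""
    lo, hi = min(i, j), max(i, j)
    return f[i] + f[j] - 2 * min(f[lo:hi + 1])


def line_breaking(num_segments, rng):
    """Cut-points of a Poisson process of intensity x, plus attachment points.

    Segment 0 is [0, cuts[0]]; segment i >= 1 is [cuts[i-1], cuts[i]], glued at
    its left end to attach[i], a uniform point of [0, cuts[i-1]], i.e. uniform
    by length on the tree built from the earlier segments.
    """
    cuts, gamma = [], 0.0
    for _ in range(num_segments):
        gamma += rng.expovariate(1.0)         # arrival times of a rate-1 process
        cuts.append(math.sqrt(2.0 * gamma))   # inverse of x -> x^2 / 2
    attach = [None] + [rng.uniform(0.0, cuts[i - 1])
                       for i in range(1, num_segments)]
    return cuts, attach


def height(p, cuts, attach):
    """Tree distance from the root 0 to the point p of [0, cuts[-1]]."""
    h = 0.0
    while True:
        i = bisect.bisect_left(cuts, p)   # index of the segment containing p
        if i == 0:
            return h + p
        h += p - cuts[i - 1]              # walk to the left end of segment i
        p = attach[i]                     # continue from the gluing point
```

For example, with the tent-shaped grid function `[0, 1, 2, 1, 0]` the grid points 1 and 3 are at distance 0, so they are identified in the associated tree.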



4.4 Complements to the continuum random tree 

More recent surveys by Le Gall [i^ and by Evans [32] show different 
directions of development of the preceding material over the last 15 
years. For instance 

(i) the Brownian snake [i^ , which combines the genealogical structure 
of random real trees with independent spatial motions. 



(ii) Diffusions on real trees: [32] Chapter 7. 

(iii) Continuum-tree valued diffusions. There are several natural ways 
to define Markov chains on the space of n-vertex trees such that the 
stationary distribution is uniform. Since the $n \to \infty$ rescaled limit 
of the stationary distribution is the Brownian CRT, it is natural to 
conjecture that the entire rescaled process can be made to converge 
to some continuum-tree valued diffusion whose stationary distribution
is the Brownian CRT. But this forces us to engage a question 
that was deliberately avoided in the previous section: what exactly 
is the space of all continuum trees, and when should we consider 
two such trees to be the same? This issue is discussed carefully 
in [32], based on the notion of the Gromov-Hausdorff space of all 
compact spaces. Two specific continuum-tree valued diffusions are 
then studied in Chapters 5 and 9 of [32]. 



(iv) Perhaps closer to our 'exchangeability' focus, a surprising aspect of 
CRTs is their application to stochastic coalescence. For $0 \le \lambda < \infty$ 
split the Brownian CRT into components at the points of a Poisson 
process of rate $\lambda$ along the skeleton of the tree. This gives a vector 
$Y(\lambda) = (Y_1(\lambda), Y_2(\lambda), \ldots)$ of masses of the components, which as 
$\lambda$ increases specifies a fragmentation process. Reversing the direction
of time by setting $\lambda = e^{-t}$ provides a construction of the 
(standard) additive coalescent [sj, that is the stochastic coalescent 
(section 3.2) with kernel $K(x,y) = x + y$ 'started from dust'. This 
result is non-intuitive, and notable as one of a handful of precise 
instances of the conceptual duality between stochastic coalescence 
and fragmentation. Also surprisingly, there are different ways that 
the additive coalescent can be 'started from dust', and these can 
also be constructed via fragmentation of certain inhomogeneous 
CRTs. This new family of CRTs satisfies analogs of the Portmanteau
Theorem, and in particular there is an explicit analog of 
the formula (ii) in section 4.3 for the distribution of the subtree $\mathcal{S}_k$ 
spanned by k random vertices Q . This older work is complemented 
by much current work, the flavor of which can be seen in 



(v) A function $F : \{1, \ldots, n\} \to \{1, \ldots, n\}$ defines a directed graph 
with edges $(i, F(i))$, and the topic random mappings studies the 
graph derived from a random function F. One can repeat the section
4.1 general program in this context. Any k sampled vertices 
define an induced substructure, the subgraph of edges
$i \to F(i) \to F(F(i)) \to \cdots$ reachable from some one of the sampled vertices. 
Analogously to Figure 4.1, contract paths between sampled
vertices/junctions to single edges, to obtain (in the $n \to \infty$ limit) 
a graph $\mathcal{S}_k$ with edge-lengths, illustrated in Figure 4.2. The theory
of $n \to \infty$ limits of random mappings turns out to be closely 
related to that of random trees; the approach based on studying 
the consistent family $(\mathcal{S}_k, k \ge 1)$ was developed in Aldous-Pitman 

Figure 4.2 The substructure $\mathcal{S}_k$ of a random mapping. 



4.5 Second-order distance structure in random networks 

In the context (section 4.3) of continuum random trees the substructure 
was distances between sampled points. At first sight one might hope that 
in many models of size-n random networks one could repeat that analysis
and find an interesting limit structure. But the particular feature 
of the models in section 4.3 is 'first order randomness' of the distance 
$D_n$ between two random vertices; $D_n/ED_n$ has a non-constant limit 
distribution, leading to the randomness in the limit structure. Other 
models tend to fall into one of two categories. For geometric networks 
(vertices having positions in $\mathbb{R}^2$; route-lengths as Euclidean length) the 
route-length tends to grow as constant c times Euclidean distance, so any 
limit structure reflects only the randomness of sampled vertex positions 
and the constant c, not any more interesting properties of the network. 
In non-geometric (e.g. Erdős-Rényi random graph) models, $D_n$ tends to 
be first-order constant. So counter-intuitively, we don't know any other 
first-order random limit structures outside the 'somewhat tree-like' context. 


Understanding second-order behavior in spatial models is very challenging;
for instance, the second order behavior of first passage percolation
times remains a longstanding open problem. But one can get second 
order results in simple 'random graph' type models, and here is the basic 
example (mentioned in Aldous and Bhamidi Q as provable by the methods
of that paper). The probability model used for a random n-vertex 
network $\mathcal{G}_n$ starts with the complete graph and assigns independent
Exponential(rate 1/n) random lengths $L_{ij} = L_{ji} = L_e$ to the $\binom{n}{2}$ edges 
$e = (i,j)$. In this model $ED_n = \log n + O(1)$ and $\mathrm{var}(D_n) = O(1)$, 
and there is second-order behavior: a non-constant limit distribution 
for $D_n - \log n$. 
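
The model is easy to simulate for moderate n. The following sketch (function names are illustrative) draws the edge lengths and computes distances from one vertex with Dijkstra's algorithm, which is a choice made for this sketch and not the method of the cited paper. Note that Python's `random.expovariate` takes the rate as its argument, so rate 1/n gives mean length n.

```python
import heapq
import math
import random


def random_lengths(n, rng):
    """Symmetric matrix of independent Exponential(rate 1/n) edge lengths."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = L[j][i] = rng.expovariate(1.0 / n)  # mean n
    return L


def distances_from(L, source):
    """Shortest-path distances from source in the complete graph with lengths L."""
    n = len(L)
    dist = [math.inf] * n
    dist[source] = 0.0
    done = [False] * n
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue
        done[u] = True
        for v in range(n):
            nd = d + L[u][v]
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Subtracting `math.log(n)` from the entries of `distances_from(L, 0)` gives one realization of the centered distances $D_n(1,j) - \log n$ whose joint limit is discussed next.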

Now fix $k \ge 3$ and write $D_n(i,j)$ for the distance between 
vertices i and j. We expect a joint limit 

$(D_n(1,2) - \log n, \ldots, D_n(1,k) - \log n) \xrightarrow{d} (D(1,2), \ldots, D(1,k))$   (4.1) 

and it turns out the limit distribution is 

$(D(1,2), \ldots, D(1,k)) \stackrel{d}{=} (\xi_1 + \eta_{12}, \ldots, \xi_1 + \eta_{1k})$ 

where $\xi_1$ has the double exponential distribution 

$P(\xi_1 \le x) = \exp(-e^{-x}), \quad -\infty < x < \infty,$ 

the $\eta_{1j}$ have logistic distribution 

$P(\eta \le x) = \frac{e^x}{1 + e^x}, \quad -\infty < x < \infty$ 

and (here and below) the r.v.s in the limits are independent. Now we 
can go one step further: we expect a joint limit for the array 

$(D_n(i,j) - \log n, 1 \le i < j \le k) \xrightarrow{d} (D(i,j), 1 \le i < j \le k)$ 

and it turns out that the joint distribution of the limit is 

$(D(i,j), 1 \le i < j \le k) \stackrel{d}{=} (\xi_i + \xi_j - \xi_{ij}, 1 \le i < j \le k)$ 

where the limit r.v.s all have the double exponential distribution. Of 
course the limit here must fit into the format of the partially exchangeable
representation theorem (Theorem 2.1), and it is pleasant to see an 
explicit function f. 
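
The two limit descriptions are consistent with each other: writing $D(1,j) = \xi_1 + (\xi_j - \xi_{1j})$, the bracketed term is a difference of two independent double exponential r.v.s, and such a difference has the logistic distribution. A short check, taking the double exponential distribution function to be $\exp(-e^{-x})$ and the logistic one to be $e^x/(1+e^x)$:

```latex
% \xi, \xi' independent with P(\xi \le x) = \exp(-e^{-x}),
% so \xi' has density e^{-y}\exp(-e^{-y}).
P(\xi - \xi' \le x)
  = \int_{-\infty}^{\infty} \exp\!\left(-e^{-(x+y)}\right) e^{-y} \exp\!\left(-e^{-y}\right) dy
  = \int_0^\infty e^{-u\,(1 + e^{-x})}\, du        % substitute u = e^{-y}
  = \frac{1}{1 + e^{-x}}
  = \frac{e^x}{1 + e^x}.
```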



5 Limits of finite deterministic structures 

Though we typically envisage limiting random structures arising as limits
of finite random structures, it also makes sense to consider limits of 
finite deterministic structures. Let me start with a trivial example. Suppose
that for each n we have a sequence $b_{n,1}, \ldots, b_{n,n}$ of n bits (binary 
digits), and write $p_n$ for the proportion of 1s. For each k and n, sample 
k random bits from $b_{n,1}, \ldots, b_{n,n}$ and call the samples $X_{n,1}, \ldots, X_{n,k}$. 
Then, rather obviously, the property 

$(X_{n,1}, \ldots, X_{n,k}) \xrightarrow{d}$ (k independent Bernoulli(p)) as $n \to \infty$, for each k 

is equivalent to the property $p_n \to p$. 
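
The equivalence can be seen by exact enumeration in a small case. The sketch below assumes sampling with replacement (the text does not specify), in which case the k samples are exactly i.i.d. Bernoulli($p_n$), so the probability of any 0/1 pattern depends on the bit sequence only through $p_n$. The function name is illustrative.

```python
from itertools import product


def pattern_probability(bits, pattern):
    """Probability that k positions sampled with replacement show the given pattern."""
    n, k = len(bits), len(pattern)
    hits = sum(
        all(bits[i] == x for i, x in zip(idx, pattern))
        for idx in product(range(n), repeat=k)   # all n^k index choices
    )
    return hits / n ** k
```

For `bits = [1, 0, 0, 1, 1]` (so $p_n = 0.6$) the pattern `(1, 0)` has probability $0.6 \times 0.4$, and rearranging the bits leaves every pattern probability unchanged.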

Now in one sense this illustrates a big limitation to the whole program: 
sampling a substructure might lose most of the interesting information 
in the original structure! But a parallel procedure in the deterministic 
graph setting (next section) does get more interesting results, and more 
sophisticated uses are mentioned in section 5.2. 

5.1 Limits of dense graphs 

Suppose that for each n we have a graph $G_n$ on n vertices. Write $p_n$ 
for the proportion of edges, relative to the total number $\binom{n}{2}$ of possible 
edges. We envisage the case $p_n \to p \in (0,1)$. 

For each n let $(U_{n,i}, i \ge 1)$ be i.i.d. uniform on 1, ..., n. Consider the 
infinite {0, 1}-valued matrix $X^n$: 

$X^n_{i,j} = 1((U_{n,i}, U_{n,j})$ is an edge of $G_n)$. 

When $n \gg k^2$ the k sampled vertices $(U_{n,1}, \ldots, U_{n,k})$ of $G_n$ will with
high probability be distinct and the k × k restriction of $X^n$ is the incidence
matrix of the induced subgraph $\mathcal{S}(n,k)$ on these k vertices. Suppose there is
a limit random matrix X: 

$X^n \xrightarrow{d} X$ as $n \to \infty$   (5.1) 

in the usual product topology, that is 

$(X^n_{i,j}, 1 \le i,j \le k) \xrightarrow{d} (X_{i,j}, 1 \le i,j \le k)$ for each k. 

(Note that by compactness there is always a subsequence in which such 
convergence holds.) Now each $X^n$ has the partially exchangeable property
(2.2), and the limit X inherits this property, so we can apply the 
representation theorem (Theorem 2.1) to describe the possible limits. In 
the {0, 1}-valued case we can simplify the representation. First consider 
a function of form (2.3) but not depending on the first coordinate, that 
is, a function $f(u_i, u_j, u_{\{i,j\}})$. Write 

$q(u_i, u_j) = P(f(u_i, u_j, U_{\{i,j\}}) = 1)$. 

The distribution of a {0, 1}-valued partially exchangeable array of the 
special form $f(U_i, U_j, U_{\{i,j\}})$ is determined by the symmetric function 
$q(\cdot,\cdot)$, and so for the general form (2.3) the distribution is specified by 
a probability distribution over such symmetric functions. 
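
In the special case just described, the array can be simulated directly from q: draw i.i.d. uniform $U_i$ and, given these, set each $X_{i,j}$ (i < j) to 1 independently with probability $q(U_i, U_j)$. A minimal sketch, with a hypothetical function name; the diagonal is left at 0 as a convention of this sketch:

```python
import random


def sample_exchangeable_graph(q, k, rng):
    """k x k symmetric 0/1 matrix with P(X[i][j] = 1 | U) = q(U_i, U_j) for i != j."""
    u = [rng.random() for _ in range(k)]          # i.i.d. uniform vertex variables
    X = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            X[i][j] = X[j][i] = int(rng.random() < q(u[i], u[j]))
    return X
```

For instance `sample_exchangeable_graph(lambda x, y: x * y, 10, random.Random(0))` samples 10 vertices of the limit graph for the (arbitrarily chosen) symmetric function $q(x,y) = xy$, while a constant q gives the restriction of an Erdős-Rényi type limit.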

This all fits our section 4.1 general program. From an arbitrary sequence
of finite deterministic graphs we can (via passing to a subsequence
if necessary) extract a 'limit infinite random graph' $\mathcal{S}_\infty$ on 
vertices 1, 2, ..., defined by its incidence matrix X in the limit (5.1), 
and we can characterize the possible limits. But what is a more concrete 
interpretation of the relation between $\mathcal{S}_\infty$ and the finite graphs $(G_n)$? 
To a probabilist the verbal expression of (5.1) 



the restriction $\mathcal{S}_k$ of $\mathcal{S}_\infty$ to vertices 1, ..., k is distributed as the 
$n \to \infty$ limit of the induced subgraph of $G_n$ on k random vertices 

is clear enough, but here is a translation into more graph-theoretic language,
following [26]. For finite graphs F, G write $t(F,G)$ for the proportion
of all mappings from vertices of F to vertices of G that are graph 
homomorphisms, i.e. map adjacent vertices to adjacent vertices. Suppose 
F has k vertices, and we label them arbitrarily as 1, ..., k. Take the 
subgraph G[k] of G on k randomly sampled vertices, labeled 1, ..., k, 
and note that whether we sample with or without replacement makes no 
difference to $n \to \infty$ limits. Then $t(F,G)$ is the probability that F is a 
subgraph of G[k]. Now write $t_{=}(F,G)$ for the probability that F = G[k]. 
For fixed k, a standard inclusion-exclusion argument shows that, for a 
sequence $(G_n)$, the existence of either family of limits 

$\lim_n t(F, G_n)$ exists, for each graph F on vertices {1, ..., k},   (5.2) 

$\lim_n t_{=}(F, G_n)$ exists, for each graph F on vertices {1, ..., k},   (5.3) 

implies existence of the other family of limits. 
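
For small graphs both quantities can be computed by brute force. In the sketch below (names are illustrative) `hom_density` counts over all maps, as in the definition of $t(F,G)$, while `induced_density` computes $t_{=}(F,G)$ for sampling without replacement, i.e. over injective maps; that choice is an assumption of the sketch and requires k to be at most the number of vertices of G. F is given as an edge list on vertices 0, ..., k-1 and G as a 0/1 adjacency matrix.

```python
from itertools import combinations, permutations, product


def hom_density(F_edges, k, G):
    """Proportion of all maps {0..k-1} -> V(G) sending edges of F to edges of G."""
    n = len(G)
    homs = sum(
        all(G[phi[i]][phi[j]] for i, j in F_edges)
        for phi in product(range(n), repeat=k)
    )
    return homs / n ** k


def induced_density(F_edges, k, G):
    """Probability that k vertices sampled without replacement induce exactly F."""
    n = len(G)
    F = {frozenset(e) for e in F_edges}
    pairs = list(combinations(range(k), 2))
    hits = total = 0
    for phi in permutations(range(n), k):        # ordered samples, no repeats
        total += 1
        hits += all(
            (frozenset(p) in F) == bool(G[phi[p[0]]][phi[p[1]]]) for p in pairs
        )
    return hits / total
```

With G the path on three vertices, a single edge F has `hom_density` 4/9 (that is, $2|E|/n^2$) and `induced_density` 2/3; the gap between the two comes only from maps that are not injective, which is why it disappears in $n \to \infty$ limits.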

In our program, the notion of $\mathcal{S}_\infty$ being the limit of $G_n$ was defined 
by (5.1), which is equivalent to requiring existence of limits (5.3) for 
each k, in which case the limits are just $E\, t_{=}(F, \mathcal{S}_k)$. And as indicated 
above, the partially exchangeable representation theorem (Theorem 2.1) 
characterizes the possible limit structures $\mathcal{S}_\infty$. A recent line of work in 
graph theory, initiated by Lovász and Szegedy [48], started by defining
convergence in the equivalent way via (5.2) and obtained the same 
characterization. This is the second recent rediscovery of special cases of 
partially exchangeable representation theory. Diaconis and Janson [26] 
give a very clear and detailed account of the relation between the two 
settings, and Diaconis-Holmes-Janson [27] work through to an explicit
description of the possible limits for a particular subclass of graphs 
called threshold graphs. Of course the line of work started in [48] has 
been developed further to produce new and interesting results in graph 
theory; see e.g. [21]. 



5.2 Further uses in finitary combinatorics 

The remarkable recent survey by Austin gives a more sophisticated 
treatment of the theory of representations of jointly exchangeable arrays, 
with the goal ( I4I section 4) of clarifying connections between that the- 
ory and topics involving limits in finitary combinatorics, such as those 
in our previous section. I don't understand this material well enough to 
do more than copy a few phrases, as follows. Section 4.1 of [12] gives 
a general discussion of 'extraction of limit objects', somewhat parallel 
to our section 14.11 but with more detailed discussion of different pos- 
sible precise mathematical structures. The paper continues, describing 
connections with the 'hypergraph regularity lemmas' featuring in com- 
binatorial proofs of Szemerédi's Theorem, and with the structure theory 
within ergodic theory that Furstenberg developed for his proof of
Szemerédi's Theorem. A subsequent technical paper Austin-Tao [l^ applies 
such methods to the topic of hereditary properties of graphs or hyper- 
graphs being testable with one-sided error; informally, this means that 
if a graph or hypergraph satisfies that property 'locally' with sufficiently 
high probability, then it can be modified into a graph or hypergraph 
which satisfies that property 'globally'. 



6 Miscellaneous comments 

To get an idea of the breadth of the topic: Mathematical Reviews 
created an 'exchangeability' classification 60G09 in 1984, which has 
attracted around 300 items; Google Scholar finds around 350 citations
of the survey Q; and the overlap is only around 50%. The 
topics in this paper, centered around structure theory (theory and 
applications of extensions of de Finetti's theorem), are in fact only a 
rather small part of this whole. In particular the 'exchangeable pairs' 
idea central to Stein's method [lj| is really a completely distinct field. 
Our central theme involved exchangeability, but one can perhaps view 
it as part of a broader theme: 

a mathematical object equipped with a probability measure is sometimes
a richer and more natural structure than the object by itself. 

For instance, elementary discussions of fractals like the Sierpiński 
gasket view the object as a set in $\mathbb{R}^2$, but it comes equipped with 
its natural 'uniform probability distribution' which enables richer 
questions — the measure of small balls around a typical point, for 
example. Weierstrass's construction of a continuous nowhere differ- 
entiable function seems at first sight artificial — where would such 
things arise naturally? — but then the fact that the Brownian motion 
process puts a probability measure on such functions indicates one 
place where they do arise naturally. Analogously the notion of real 
tree (section 4.3) may seem at first sight artificial (how might such 
objects arise naturally?) but then realizing they arise as limits of 
random finite trees indicates one place where they do arise naturally.
Of course the underlying structure 'a space with a metric and a 
measure' arises in many contexts, for example (under the name met- 
ric measure space) in the context of differential geometry questions 
i3- 

Acknowledgments I thank Persi Diaconis, Jim Pitman and an anonymous
referee for very helpful comments. 